Heavy-Tail Distribution from Correlation of Discrete Stochastic Process
We propose a stochastic process driven by the memory effect, with novel distributions which include both exponential and leptokurtic heavy-tailed distributions. One class of the distributions is analytically derived from the continuum limit of the discrete binary process with the renormalized auto-correlation. The moment generating function is obtained in closed form, so the cumulants are calculated and shown to be convergent. The other class of the distributions is numerically investigated. The combination of two stochastic processes with memory of different signs under a regime switching mechanism results in power-law decay. Therefore we claim that memory is an alternative origin of heavy-tail.
Introduction. - Non-Gaussian distributions are observed throughout nature. They are prevalent in condensed matter and soft matter systems, in phenomena such as crystal growth, polymer transportation, and polymer distribution [1, 2]. They are also observed in astrophysics, for instance in galaxy mass and velocity distributions [3, 4]. Leptokurtosis, skewness, and the power-law tail are characteristics of social science data such as financial time series [5, 6] and complexities [8]. Among the non-Gaussianities, the most notable features are the power-law distribution and the auto-correlation, which are often referred to as heavy-tail and memory.

Heavy-tail is the statistical phenomenon in which extreme values occur more frequently than the conventional Gaussian law predicts. The Lévy model [5-7], derived from the Poisson jump diffusion process, has been used to describe this feature. The Lévy distribution demonstrates excellent fits to various kinds of real data. Moreover the jump diffusion is Markovian, hence the Lévy model has an advantage in the use of Ito calculus [9]. However, this Markovian non-Gaussian model cannot explain non-Markovian behaviors. Memory is the representative non-Markovian quantity, which is often measured by the auto-correlation of a stochastic process. Burst and clustering are the typical phenomena which are driven by memory. Recently the heavy-tail statistics in terms of correlation has been studied in the context of physics [10]. Fractional Brownian motion [11] (FBM) is one of the most popular mathematical theories for dealing with the memory effect. The parameter of FBM is called the Hurst coefficient [12], which is defined as the exponent of the power-law correlation function of the two white noises at different times. While FBM has an obvious advantage on the issue of memory, there are three drawbacks. First, conventional stochastic calculus is not applicable to FBM due to its strongly non-Markovian nature, therefore little is known about its analytic properties. Second, the inner dynamics of the Hurst parameter is unclear. Third, a heavy-tail distribution cannot be constructed from FBM.

In this letter, we propose a new micro process driven by memory that results in the heavy-tail distribution. This non-Markovian non-Gaussian process not only goes beyond the Poisson jump diffusion process, which is the only known heavy-tail micro process so far, but also overcomes the three drawbacks of FBM in that it is an analytically solvable model.

Stochastic process with memory. - We design the simplest possible stochastic process with both non-Markovian and non-Gaussian properties. For this purpose, we implement a binomial process and assume zero skewness for simplicity. We require that a single recurrence rule should define the whole process and that the number of parameters should be minimal. Under these restrictions there are not many possibilities to model the stochastic process. We propose a process [15] that is defined by the recursive formula for the discrete probability distribution function (PDF) $P_{n|x}$ with binomial transfer probabilities given as [16],

$$P_{n|x} = P_{n-1|x+1}\left[\tfrac{1}{2} - (x+1)\,\epsilon\right] + P_{n-1|x-1}\left[\tfrac{1}{2} + (x-1)\,\epsilon\right], \qquad (1)$$

where $n \in \{0, 1, 2, \dots, N\}$ and we define stochastic displacements from the origin by the accumulation of the $+1$s and the $-1$s. Each event is indexed by $n$ among the entire $N$ events. The displacement $x$ runs from $-n$ to $n$ in steps of 2, i.e. $x \in \{-n, -n+2, -n+4, \dots, n-2, n\}$ with $n+1$ elements, and we start with $P_{0|0} = 1$. For $x$ which is not in this range, $P_{n|x} = 0$. Probability distributions for some values of the coupling $\epsilon$ are illustrated in Fig. 1. We introduce the generating function for the analytic study of the process as $Z_n(q) = \sum_{x=-\infty}^{\infty} P_{n|x}\, q^x = \sum_{x=-n}^{n} P_{n|x}\, q^x$, where $Z_n(1) = \sum_{x=-n}^{n} P_{n|x} = 1$.
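The recurrence (1) is straightforward to iterate numerically. The following is a minimal sketch, not the authors' code: the function name `memory_walk_pdf` and the dictionary representation are illustrative choices, and the recurrence is read from the source site (a walker at $y$ steps to $y+1$ with probability $1/2 + y\epsilon$ and to $y-1$ with probability $1/2 - y\epsilon$), which is equivalent to (1).

```python
from math import comb

def memory_walk_pdf(N, eps):
    """Return {x: P_{N|x}} by iterating recurrence (1) from P_{0|0} = 1."""
    pdf = {0: 1.0}
    for _ in range(N):
        nxt = {}
        for y, weight in pdf.items():
            # A walker at y steps to y+1 with probability 1/2 + y*eps and to
            # y-1 with probability 1/2 - y*eps; this is recurrence (1) written
            # from the source site instead of the target site.
            nxt[y + 1] = nxt.get(y + 1, 0.0) + weight * (0.5 + y * eps)
            nxt[y - 1] = nxt.get(y - 1, 0.0) + weight * (0.5 - y * eps)
        pdf = nxt
    return pdf

N = 40
pdf_pos = memory_walk_pdf(N, 0.4 / N)    # positive coupling
pdf_neg = memory_walk_pdf(N, -0.4 / N)   # negative coupling
pdf_zero = memory_walk_pdf(N, 0.0)       # memoryless binomial walk

# At eps = 0 the result should reduce to the binomial distribution.
binomial_peak = comb(N, N // 2) / 2 ** N

print(sum(pdf_pos.values()))                 # total probability
print(pdf_neg[0], pdf_zero[0], pdf_pos[0])   # peak heights at the origin
print(binomial_peak)
```

The three peak heights give a quick view of how the sign of the coupling reshapes the center of the distribution relative to the memoryless case.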
With the renormalized coupling $\epsilon_N = k/N$, the more fine-grained the time lattice is, the smaller the coupling becomes. By this renormalization, the convergence is guaranteed (as one can infer from Fig. 1) and the interpretation of the auto-correlation formula (10) given below becomes natural. Instead of $\epsilon$, we take $k$ as the control parameter of the model, which ranges from $-1/2$ to $1/2$. Cumulants or $n$-point correlations are calculated from derivatives of the generating function (6). As expected, the average and the skewness are zero. The variance and the 4-point correlation are calculated to be



FIG. 1: Probability distribution functions $P_{n|x}$ defined in equation (1) with $N = 40$ (left) and $N = 100$ (right). The values of $\epsilon$ are chosen to have $\epsilon = \pm 0.4/N$. The Gaussian distribution where $\epsilon = 0$ is also shown for comparison.



The distribution is obtained simply by reading off the $q^x$ coefficients from the generating function, and the boundary conditions are $Z_0(q) = 1$ and $Z_1(q) = \tfrac{1}{2}(q + q^{-1})$. In the Gaussian limit the moment generating function is given by $\lim_{\epsilon \to 0} Z_N(q) = \left[\tfrac{1}{2}(q + q^{-1})\right]^N$. Then the recurrence (1) is recast into a differential equation for $q$,



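$$Z_{n+1}(q) = \tfrac{1}{2}\left(q + q^{-1}\right) Z_n(q) + \epsilon\,(q^2 - 1)\,\partial_q Z_n(q). \qquad (2)$$

We make a substitution that $Z_n(q) = \epsilon^n\, q^{\frac{1}{2\epsilon}}\, (q^2 - 1)^{-\frac{1}{2\epsilon}}\, Y_n(q)$ for all $n$. Then

$$Y_{n+1}(q) = (q^2 - 1)\,\partial_q Y_n(q)$$

with $Y_0(q) = \left(q - q^{-1}\right)^{\frac{1}{2\epsilon}}$.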
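As a consistency check on the step from the recurrence (1) to the differential equation (2), one can apply both to the same coefficients and compare. The sketch below is illustrative only (the function names are hypothetical); it stores $Z_n(q)$ as a map from powers of $q$ to coefficients and uses exact rational arithmetic so that the comparison is free of rounding error.

```python
from fractions import Fraction

HALF = Fraction(1, 2)

def step_recurrence(P, eps):
    """One step of recurrence (1): P maps x -> P_{n|x}; returns P_{n+1|x}."""
    targets = {x + s for x in P for s in (-1, 1)}
    return {
        x: P.get(x + 1, 0) * (HALF - (x + 1) * eps)
           + P.get(x - 1, 0) * (HALF + (x - 1) * eps)
        for x in targets
    }

def step_operator(Z, eps):
    """Apply (1/2)(q + 1/q) + eps*(q^2 - 1) d/dq to Z = {power: coefficient}."""
    out = {}
    for x, c in Z.items():
        # (1/2)(q + 1/q) acting on c*q^x
        out[x + 1] = out.get(x + 1, 0) + HALF * c
        out[x - 1] = out.get(x - 1, 0) + HALF * c
        # eps*(q^2 - 1) d/dq (c*q^x) = eps*x*c*q^(x+1) - eps*x*c*q^(x-1)
        out[x + 1] += eps * x * c
        out[x - 1] -= eps * x * c
    return out

eps = Fraction(1, 50)          # arbitrary illustrative coupling
P = {0: Fraction(1)}           # P_{0|0} = 1
Z = {0: Fraction(1)}           # Z_0(q) = 1
agree = True
for n in range(8):
    P, Z = step_recurrence(P, eps), step_operator(Z, eps)
    agree = agree and (P == Z)

print(agree)
print(step_operator({0: Fraction(1)}, eps))  # coefficients of Z_1(q)
```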



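Further we substitute $r = -\tanh^{-1}(q)$, then

$$q = \frac{1 - e^{2r}}{1 + e^{2r}} = -\tanh(r), \qquad Y_0(q(r)) = \left(\frac{2}{\sinh(2r)}\right)^{\frac{1}{2\epsilon}}.$$

In this coordinate $(q^2 - 1)\,\partial_q = \partial_r$, so we have

$$Z_N(q(r)) = \epsilon^N \left(\frac{\sinh(2r)}{2}\right)^{\frac{1}{2\epsilon}} \left[\partial_r\right]^N \left(\frac{2}{\sinh(2r)}\right)^{\frac{1}{2\epsilon}}. \qquad (5)$$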
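The derivative representation (5) can be spot-checked numerically for small $N$. The sketch below is an illustration under stated assumptions: it uses the substitution $q = -\tanh(r)$, evaluates $\epsilon^N (\sinh 2r / 2)^{1/2\epsilon}\, \partial_r^N (2/\sinh 2r)^{1/2\epsilon}$ with central finite differences (only $N = 1, 2$ are handled), and compares it with $Z_N(q)$ built directly from the recurrence (1). The values of `eps`, `r` and the step `h` are arbitrary.

```python
from math import sinh, tanh

def z_from_recurrence(N, eps, q):
    """Z_N(q) = sum_x P_{N|x} q^x with P_{N|x} built from recurrence (1)."""
    pdf = {0: 1.0}
    for _ in range(N):
        nxt = {}
        for y, w in pdf.items():
            nxt[y + 1] = nxt.get(y + 1, 0.0) + w * (0.5 + y * eps)
            nxt[y - 1] = nxt.get(y - 1, 0.0) + w * (0.5 - y * eps)
        pdf = nxt
    return sum(w * q ** x for x, w in pdf.items())

def z_from_derivative(N, eps, r, h=1e-3):
    """Eq. (5), with the N-th r-derivative taken by central differences."""
    a = 1.0 / (2.0 * eps)

    def y0(s):
        # Y_0 in the r coordinate: (2 / sinh(2 s))^(1 / (2 eps))
        return (2.0 / sinh(2.0 * s)) ** a

    if N == 1:
        deriv = (y0(r + h) - y0(r - h)) / (2.0 * h)
    elif N == 2:
        deriv = (y0(r + h) - 2.0 * y0(r) + y0(r - h)) / h ** 2
    else:
        raise ValueError("this sketch only handles N = 1 or N = 2")
    return eps ** N * (sinh(2.0 * r) / 2.0) ** a * deriv

eps, r = 0.1, 0.5
q = -tanh(r)
for N in (1, 2):
    print(N, z_from_recurrence(N, eps, q), z_from_derivative(N, eps, r))
```

Finite differences become unreliable for higher derivatives, so this is only a check of the first two orders, not a practical way to evaluate $Z_N$.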



Solving the $N$-th derivative, the generating function is obtained in the closed form as,
{ - 4(2 - kf + 3(4 - k)^ + i-k)^ +
3k[2{2 - kf - (4 - k)^ - (-fc)^]}/(2 + 4fce)
Now we present the analytic evidence of the convergence by showing the convergence of the cumulants in the large $N$ limit. It is calculated analytically in the small $k$ region, and for the convergence at high $k$ we resort to numerical simulation. Taking the coupling $\epsilon$ as the renormalized one, $k/N$, and making use of the results (7) and (8), we calculate the kurtosis up to second order in $k$. Nontrivial cancellations occur in each coefficient of $k$ and $k^2$. Consequently in the large $N$ limit the kurtosis converges to 3 for sufficiently small $k$.
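The convergence of the kurtosis can also be probed directly from the recurrence (1) with the renormalized coupling $\epsilon = k/N$. The sketch below is illustrative, not the authors' computation: the helper names and the lattice sizes are arbitrary, and the mean is taken to be zero by symmetry.

```python
def memory_walk_pdf(N, eps):
    """Return {x: P_{N|x}} by iterating recurrence (1) from P_{0|0} = 1."""
    pdf = {0: 1.0}
    for _ in range(N):
        nxt = {}
        for y, w in pdf.items():
            nxt[y + 1] = nxt.get(y + 1, 0.0) + w * (0.5 + y * eps)
            nxt[y - 1] = nxt.get(y - 1, 0.0) + w * (0.5 - y * eps)
        pdf = nxt
    return pdf

def kurtosis(pdf):
    """Fourth moment over squared second moment (the mean is zero by symmetry)."""
    m2 = sum(w * x ** 2 for x, w in pdf.items())
    m4 = sum(w * x ** 4 for x, w in pdf.items())
    return m4 / m2 ** 2

def kurtosis_vs_N(k, sizes):
    """Kurtosis for each lattice size N, using the renormalized coupling k/N."""
    return [(N, kurtosis(memory_walk_pdf(N, k / N))) for N in sizes]

for k in (0.4, -0.4):
    for N, value in kurtosis_vs_N(k, (25, 100, 400)):
        print(k, N, round(value, 4))
```

For $k = 0$ the same routine reproduces the kurtosis of the plain binomial walk, $3 - 2/N$, which is a convenient sanity check before scanning nonzero $k$.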
where the binomial function is defined as the product
{H'')=nl-o\i,+N-^)/Nl.

Convergence: Cumulants and Auto-Correlation. - The coupling $\epsilon$ is a fixed constant at all times. We introduce the renormalization of the coupling $\epsilon \to \epsilon_N = k/N$.



To confirm the convergence, we numerically calculate the kurtosis. The measured kurtosis is shown in Fig. 2 as a function of $N$, corroborating the analytic result (9) in the large $N$ limit. Here we present the size of the auto-correlation with generic time lag $L - 1$ with $L = 2, 3, \dots$. The displacement at the level $n+1$ is denoted as $\Delta S_{n+1}$ and has value $\pm 1$, and $\Delta S_{n+L}$ has alternating values of $\pm 1$, but they are weighted by the probabilities defined
FIG. 2: Variance and kurtosis as a function of $N$ with $k = 0.4$ (left) and $k = -0.4$ (right). The values of the variance are divided by $N$.
by (1). The result is
where $H = 1 + 2k + \tfrac{8}{3}k^2 + O(k^3)$. Note that $n$ and $x$ parametrize time and displacement. The auto-correlation has no $n$ dependence, therefore the process is stationary under changes in time. Moreover a stochastic trajectory at large displacement is shown to be highly correlated, which is consistent with our intuition. The second equality not only exhibits the convergence of the auto-correlation but also clarifies the origin of the Hurst coefficient [12], whose inner dynamics had been unclear. The auto-correlation is well defined in the large $N$ limit, when the result (7) is incorporated. The variance numerically calculated in Fig. 2 is also consistent with Eq. (7). As briefly mentioned in the introduction, the Hurst coefficient $h$ is defined from the displacement in terms of the exponent of the time lag, i.e. $\Delta x = 2\sqrt{\langle x^2 \rangle} \sim 2N^h$. In our theory $h$ is read from $\Delta x \sim 2\sqrt{N \cdot H}$, and the value of $h$ is simply $1/2$ in the Gaussian case since $\Delta x \sim 2\sqrt{N}$. In Eq. (10), the auto-correlation formula is recast in terms of $H$, which means that the Hurst coefficient $h$ can be determined from the micro dynamics of the stochastic process. It is one of the novel features of our theory that differs from the conventional formulae based on FBM [11].
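The small-$k$ behavior of the variance can be cross-checked with a short calculation. The sketch below rests on an assumption that is derived here rather than quoted from the paper: averaging $x^2$ over one step of the recurrence (1) gives $V_{n+1} = (1 + 4\epsilon) V_n + 1$ with $V_0 = 0$, whose large-$N$ limit with $\epsilon = k/N$ is $V_N/N \to (e^{4k} - 1)/(4k) = 1 + 2k + \tfrac{8}{3}k^2 + O(k^3)$. The function names are illustrative.

```python
from math import exp

def variance_exact(N, k):
    """<x^2> after N steps with eps = k/N, from V_{n+1} = (1 + 4*eps)*V_n + 1."""
    eps = k / N
    V = 0.0
    for _ in range(N):
        V = (1.0 + 4.0 * eps) * V + 1.0
    return V

def H_limit(k):
    """Large-N limit of <x^2>/N implied by the one-step recursion above."""
    return (exp(4.0 * k) - 1.0) / (4.0 * k) if k != 0 else 1.0

def H_series(k):
    """Second-order expansion of H_limit: 1 + 2k + (8/3) k^2."""
    return 1.0 + 2.0 * k + (8.0 / 3.0) * k ** 2

k, N = 0.05, 100_000
print(variance_exact(N, k) / N, H_limit(k), H_series(k))
```

In the memoryless case $k = 0$ the recursion reduces to $V_N = N$, consistent with $\Delta x \sim 2\sqrt{N}$ for the Gaussian walk.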

Heavy-tail: Regime switching [17]. - We construct the leptokurtic heavy-tail distribution profile from the memory effect. The PDF of a process with negative $k$ has a sharp peak in the center but falls to zero very rapidly, while the PDF with positive $k$ has slowly falling tails but a lower peak in the center than that with negative $k$, as illustrated in Fig. 1.

Regardless of the sign of the auto-correlation, the kurtosis converges to 3 in the case of the monotonic memory process



FIG. 3: The probability distribution of log return data of the Dow Jones daily index. The dotted and dashed lines are predictions from Gaussian and stable distributions respectively. The solid line is our model with the best fit parameters, $b = 0.38$, $\delta = 13.8$ and $k = 3.0$. Note that the PDFs of the data and the predictions from the Gaussian distribution and our model are scaled to $\sigma = 1$, while that from the stable distribution is scaled so that the fit to the data is best.



defined by (1). Such inflexibility is not desirable when we consider general applications. Inspired by the regime switching mechanism [13], we combine two stochastic processes with different signs of the auto-correlation, which allows us to arrive at the heavy-tail distribution with flexible values of kurtosis. As an example, we examine a regime switching behavior of the Gaussian profile with $x$ dependence such as $\epsilon \to \epsilon\,(b - e^{-x^2/2\delta^2})$ with $b > 0$. Under this mechanism, a random walk flips the sign of its auto-correlation to positive when it deviates from the center regime, so that the trail tends to accelerate toward the marginal displacement. While it remains in the center regime, on the other hand, the trajectory is negatively correlated and tends to concentrate increasingly at the center. Making use of this model, we show a proper quantitative match to the Dow Jones data [18].
In Fig. 3 we illustrate the log-scaled probability distribution of the log return, $\Delta s/s = (s_{i+1} - s_i)/s_i$, where $s_i = s(t_i)$ is the stock price on a given day $t_i$. The PDF is normalized to have the variance $\sigma = 1$. Obviously the data cannot be explained by the Gaussian distribution, which is depicted as the dotted line. In comparing the stock data with our model prediction, we calculate the $\chi^2$ statistic for logarithmic values of the PDF, adopting $b$, $\delta$ and $k$ as free parameters. Since we are interested in the heavy-tail behavior which starts at $\Delta s/s \approx 2$, the fit is made for the range $1.5 < \Delta s/s < 10$. Our results are, however, not largely affected by this choice. The black solid line shown in Fig. 3 is our prediction with the best fit parameters, $(b, \delta, k) = (0.38, 13.8, 3.0)$. Unlike the monotonic memory process (1), $k$ can be large as long as the transfer probability is between 0 and 1 [19]. In this data analysis, the location of $x = \Delta s/s$ where the transfer probability starts to be ill-defined is around $\Delta s/s = 7$. The majority of the data points fall in the well defined region.
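The regime switching rule can be sketched on top of the same random walk. The code below is an illustration under several assumptions that are not taken from the paper: the coupling $(k/N)(b - e^{-x^2/2\delta^2})$ is evaluated at the walker's current position so that the two step probabilities sum to one, the up-step probability is clipped to $[0, 1]$ where it would otherwise be ill-defined, and the lattice size `N = 400` is arbitrary. Only $b$, $\delta$ and $k$ are the quoted best fit values.

```python
from math import exp

def regime_coupling(x, k, N, b, delta):
    """x-dependent coupling (k/N) * (b - exp(-x^2 / (2 delta^2)))."""
    return (k / N) * (b - exp(-x * x / (2.0 * delta ** 2)))

def regime_switching_pdf(N, k, b, delta):
    """Iterate the walk with up-step probability 1/2 + y*eps(y), clipped to [0, 1]."""
    pdf = {0: 1.0}
    for _ in range(N):
        nxt = {}
        for y, w in pdf.items():
            p_up = 0.5 + y * regime_coupling(y, k, N, b, delta)
            p_up = min(1.0, max(0.0, p_up))  # assumption: clip where ill-defined
            nxt[y + 1] = nxt.get(y + 1, 0.0) + w * p_up
            nxt[y - 1] = nxt.get(y - 1, 0.0) + w * (1.0 - p_up)
        pdf = nxt
    return pdf

def first_ill_defined_x(N, k, b, delta):
    """Smallest x >= 1 violating (x - 1)*(k/N)*(b - exp(-x^2/(2 delta^2))) < 1/2."""
    for x in range(1, N + 1):
        if (x - 1) * regime_coupling(x, k, N, b, delta) >= 0.5:
            return x
    return None

N = 400                          # illustrative lattice size, not from the paper
k, b, delta = 3.0, 0.38, 13.8    # best fit values quoted in the text
pdf = regime_switching_pdf(N, k, b, delta)

print(sum(pdf.values()))                      # total probability
print(regime_coupling(0, k, N, b, delta))     # coupling in the center regime
print(regime_coupling(50, k, N, b, delta))    # coupling far from the center
print(first_ill_defined_x(N, k, b, delta))    # where the inequality first fails
```

The sign change of the coupling between the center and the outer region is the mechanism described above: negative memory near the origin and positive memory once the walker has deviated.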

We also compare the performance of our model with the existing heavy-tail theory, i.e. Lévy theory. The blue dashed line is the $\alpha$-stable distribution. At a glance the stable distribution seems to fit the heavy-tail data better, but there are several problems. Stable distributions generically have divergent cumulants due to their strict power-law behavior at high $\Delta s/s$, and it is impossible to compute the variance, which is one of the most important statistics. Therefore we have no choice but to treat the scaling parameter as a free parameter, which means we choose the scaling such that we necessarily obtain the best fit. Nevertheless, the stable distribution fails to explain the PDF data at low $\Delta s/s$. By contrast, we normalize the variance to unity for our model to compare fairly to the data, and it precisely predicts the data at low $\Delta s/s$. A tempered-stable distribution [7] can be used to cure the defect of a stable distribution, by artificially assuming an exponentially decaying regulator distribution function in the asymptotic region. Our model, on the other hand, naturally incorporates such a regulating behavior through a physical mechanism, i.e. regime switching.

Discussion. - From a simple micro process with the memory effect we have derived a class of analytically solvable distributions of good flexibility controlled by the single auto-correlation parameter $k$. The resulting distribution has an exponential tail, but it can be made heavier around the intermediate region. The PDF in closed form has to be calculated by performing an inverse Laplace transform or fast Fourier transform on the moment generating function (6), and this will be presented in future works. We studied an application to real financial data so as to demonstrate the accuracy of our model. Issues of convergence and renormalization in the regime switching model should be clarified by further analytic study; however, our numerical analysis provides an important first step.
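Recovering a PDF from a generating function by Fourier inversion can be illustrated on the Gaussian limit, where $Z_N(q) = [\tfrac{1}{2}(q + q^{-1})]^N$ is known. The sketch below is a hedged illustration, not the inversion of the closed form (6): it samples $Z_N$ on the unit circle and applies a plain discrete Fourier sum (an FFT would do the same job faster). The function names are hypothetical.

```python
import cmath
from math import comb, pi

def pdf_from_generating_function(Z, n):
    """Recover P_{n|x}, x = -n..n, from Z(q) sampled at 2n+1 points on |q| = 1."""
    M = 2 * n + 1
    samples = [Z(cmath.exp(2j * pi * j / M)) for j in range(M)]
    pdf = {}
    for x in range(-n, n + 1):
        # Inverse discrete Fourier sum picks out the coefficient of q^x.
        total = sum(samples[j] * cmath.exp(-2j * pi * j * x / M) for j in range(M))
        pdf[x] = (total / M).real
    return pdf

N = 12

def gaussian_limit(q):
    """Generating function of the memoryless walk, [(q + 1/q)/2]^N."""
    return ((q + 1 / q) / 2) ** N

pdf = pdf_from_generating_function(gaussian_limit, N)
for x in range(-N, N + 1, 4):
    print(x, pdf[x], comb(N, (N + x) // 2) / 2 ** N)
```

The same routine applies to any generating function that can be evaluated on the unit circle, provided its powers of $q$ lie between $-n$ and $n$.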

Both non-Markovian (FBM) and Markovian non-Gaussian (Lévy) models have been extensively developed; however, the non-Markovian non-Gaussian framework has received little attention in previous studies. In this letter, we propose an analytically solvable non-Markovian non-Gaussian model. We also numerically reproduced the PDF of real data which has a heavy tail. Thus we have shown that heavy-tail can indeed be derived from memory.
It is a game of tossing a flexible coin that remembers the previous outcome and bends itself according to the result at each step. For example, when the outcome turns out to be heads, it bends itself toward tails to a certain degree and the process will be positively auto-correlated. The size of these bends is denoted as $\epsilon$, which is going to be matched to the size of the auto-correlation in the stochastic process.

This stochastic process does not fall in the class of urn processes implied by the $\beta$-distribution [14], though it has a close resemblance. The novel feature is the use of the renormalized auto-correlation, which enables one to arrive at a convergent theory.
stochastic process with thresholds.

Log return of the Dow Jones daily index from 1928.10.1 to 2012.1.23, shown as the dotted line in the figure.
In other words, the process is well defined as long as the inequality $(x - 1)\,\frac{k}{N}\left(b - \exp(-x^2/2\delta^2)\right) < 1/2$ holds.



